Search for: All records

Creators/Authors contains: "Akgun, Ibrahim Umit"

  1. Operating systems include many heuristic algorithms designed to improve overall storage performance and throughput. Because such heuristics cannot work well for all conditions and workloads, system designers have resorted to exposing numerous tunable parameters to users, burdening users with continually optimizing their own storage systems and applications. Storage systems are usually responsible for most of the latency in I/O-heavy applications, so even a small latency improvement can be significant. Machine learning (ML) techniques promise to learn patterns, generalize from them, and enable optimal solutions that adapt to changing workloads. We propose that ML solutions become a first-class component in OSs and replace manual heuristics to optimize storage systems dynamically. In this article, we describe our proposed ML architecture, called KML. We developed a prototype KML architecture and applied it to two case studies: optimizing readahead and NFS read-size values. Our experiments show that KML consumes less than 4 KB of dynamic kernel memory, has a CPU overhead smaller than 0.2%, and yet can learn patterns and improve I/O throughput by as much as 2.3× and 15× for the two case studies, even for complex, never-before-seen, concurrently running mixed workloads on different storage devices. (A user-space sketch of the readahead tunable that KML adjusts appears after this list.)
  2. BBR is a newer TCP congestion-control algorithm with promising features, but it can often be unfair to existing loss-based congestion-control algorithms. This is because BBR's sending rate is dictated by static parameters that do not adapt well to dynamic and diverse network conditions. In this work, we introduce BBR-ML, an in-kernel ML-based tuning system for BBR, designed to improve fairness when competing with loss-based congestion control. To build BBR-ML, we discretized the network-condition search space and trained a model on 2,500 different network conditions. We then modified BBR to run an in-kernel model that predicts network buffer sizes and uses the prediction to choose parameter settings. Our preliminary evaluation results show that BBR-ML can improve fairness when competing with Cubic by up to 30% in some cases. (A sketch showing how a socket opts into BBR from user space follows this list.)
    Storage systems and their OS components are designed to accommodate a wide variety of applications and dynamic workloads. Storage components inside the OS contain various heuristic algorithms to provide high performance and adaptability for different workloads. These heuristics may be tunable via parameters, and some system calls allow users to optimize their system performance. These parameters are often predetermined based on experiments with limited applications and hardware. Thus, storage systems often run with these predetermined and possibly suboptimal values. Tuning these parameters manually is impractical: one needs an adaptive, intelligent system to handle dynamic and complex workloads. Machine learning (ML) techniques are capable of recognizing patterns, abstracting them, and making predictions on new data. ML can be a key component to optimize and adapt storage systems. In this position paper, we propose KML, an ML framework for storage systems. We implemented a prototype and demonstrated its capabilities on the well-known problem of tuning optimal readahead values. Our results show that KML has a small memory footprint, introduces negligible overhead, and yet enhances throughput by as much as 2.3×.
  4. Modern applications use storage systems in complex and often surprising ways. Tracing system calls is a common approach to understanding applications' behavior, allowing offline analysis and enabling replay in other environments. But current system-call tracing tools have drawbacks: (1) they often omit information, such as raw data buffers, that is needed for full analysis; (2) they have high overheads; (3) they often use non-portable trace formats; and (4) they may not offer useful and scalable analysis and replay tools. We have developed Re-Animator, a powerful system-call tracing tool that focuses on storage-related calls and collects maximal information, capturing complete data buffers and writing all traces in the standard DataSeries format. We also created a prototype replayer that focuses on calls related to file-system state. We evaluated our system on long-running server applications such as key-value stores and databases. Our tracer has an average overhead of only 1.8-2.3×, but the overhead can be as low as 5% for I/O-bound applications. Our replayer verifies that its actions are correct and faithfully reproduces the logical file-system state generated by the original application. (A minimal system-call tracing sketch appears after this list.)
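The first and third records describe KML tuning the kernel's readahead heuristic. As a point of reference only, the sketch below sets the same per-device readahead window from user space by writing to a block device's sysfs knob; KML itself runs in-kernel, and the device path and the "predicted" value here are illustrative assumptions, not part of KML.

```c
/*
 * Illustrative sketch (not the KML implementation): adjust the per-device
 * readahead window that the kernel heuristic normally manages.
 * The sysfs path and the predicted value are assumptions for demonstration;
 * writing the knob typically requires root privileges.
 */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ML model's output, in kilobytes. */
static long predict_readahead_kb(void)
{
    return 1024;  /* e.g., a model might favor larger readahead for sequential workloads */
}

int main(void)
{
    const char *knob = "/sys/block/sda/queue/read_ahead_kb";  /* assumed device */
    FILE *f = fopen(knob, "w");
    if (!f) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    fprintf(f, "%ld\n", predict_readahead_kb());
    fclose(f);
    return EXIT_SUCCESS;
}
```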
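The second record concerns BBR. The sketch below only shows the standard Linux mechanism for selecting BBR on a single socket via the TCP_CONGESTION option; it is unrelated to BBR-ML's in-kernel tuning and assumes the tcp_bbr module is available on the system.

```c
/*
 * Illustrative sketch: select BBR as the congestion-control algorithm for a
 * socket using the standard Linux TCP_CONGESTION socket option, then read the
 * option back to confirm which algorithm is in use.
 */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        perror("socket");
        return 1;
    }

    const char *cc = "bbr";  /* requires the tcp_bbr module to be loaded */
    if (setsockopt(sock, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc)) < 0)
        perror("setsockopt(TCP_CONGESTION)");

    char buf[16] = {0};
    socklen_t len = sizeof(buf);
    if (getsockopt(sock, IPPROTO_TCP, TCP_CONGESTION, buf, &len) == 0)
        printf("congestion control in use: %s\n", buf);

    close(sock);
    return 0;
}
```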
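The fourth record describes Re-Animator. The minimal ptrace-based sketch below only illustrates the general idea of intercepting storage-related system calls; it is not Re-Animator's mechanism (which captures full data buffers and writes DataSeries traces) and assumes x86_64 Linux.

```c
/*
 * Minimal ptrace-based system-call tracing sketch (x86_64 Linux assumed).
 * NOT Re-Animator's mechanism: it merely logs the numbers of a few
 * storage-related syscalls made by a child process, without capturing
 * data buffers or emitting DataSeries records.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <program> [args...]\n", argv[0]);
        return 1;
    }

    pid_t child = fork();
    if (child == 0) {                      /* traced child */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[1], &argv[1]);
        perror("execvp");
        _exit(1);
    }

    int status, entering = 1;
    waitpid(child, &status, 0);            /* initial stop after exec */
    while (!WIFEXITED(status)) {
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);   /* run to next syscall stop */
        waitpid(child, &status, 0);
        if (WIFSTOPPED(status)) {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child, NULL, &regs);
            long nr = (long)regs.orig_rax;
            /* each syscall yields an entry stop and an exit stop; log entries only */
            if (entering && (nr == SYS_read || nr == SYS_write || nr == SYS_openat))
                printf("syscall %ld\n", nr);
            entering = !entering;
        }
    }
    return 0;
}
```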